Abstract: We apply large deviations theory to study the asymptotic performance of running consensus distributed detection in sensor networks. Running consensus is a recently proposed stochastic approximation type algorithm. At each time step $k$, the state at each sensor is updated by a local averaging of the sensor's own state and the states of its neighbors (consensus) and by accounting for the new observations (innovation). We assume Gaussian, spatially correlated observations. We allow the underlying network to be time varying, provided that the graph that collects the union of links that are online at least once over a finite time window is connected. This paper shows through large deviations that, under the stated assumptions on the network connectivity and the sensors' observations, the running consensus detector asymptotically approaches in performance the optimal centralized detector. That is, the Bayes probability of detection error (with the running consensus detector) decays exponentially to zero as $k\to\infty$ at the Chernoff information rate, the best achievable rate of the asymptotically optimal centralized detector.



I. Introduction 

We apply large deviations theory to study the asymptotic performance of distributed detection in sensor networks. Each node in the network senses the environment and cooperates locally with its neighbors to decide between the two hypotheses, $H_1$ and $H_0$. The nodes are connected by a generic, time-varying network, and there is no fusion center. Specifically, we consider distributed detection via running consensus¹, which was recently proposed in [2]. With running consensus, at each time $k$, the $N$ nodes update their decision variables by: 1) incorporating a new observation (innovation step); and 2) mixing their decision variables locally with those of their neighbors (consensus step).

We allow the underlying communication graph to be (deterministically) time varying, but we assume that the graph that collects all communication links that are online (at least once) within a finite time window of length $B$ is connected. We assume Gaussian, spatially correlated, temporally uncorrelated sensor observations. Under the stated assumptions on the network connectivity and the sensors' observations, we show that the running consensus distributed detector is asymptotically optimal as the number of observations $k$ goes to infinity. That is, the running consensus distributed detector asymptotically approaches the performance of the optimal centralized detector. We apply large deviations to study the asymptotic performance of both the (asymptotically) optimal centralized detector, which collects the observations from all nodes $i$ at each time $k$, and the running consensus detector. For both detectors, the Bayes probability of error decays as $e^{-kC}$, where $C$ is the Chernoff distance between the distributions of the $N\times 1$ observation vectors under the two hypotheses, i.e., the Chernoff information.

We now briefly review the existing work on distributed detection, which has been extensively studied. Prior work studies parallel fusion architectures (see, e.g., [3], [4], [5], [6], [7], [8]), where all nodes communicate with a fusion node. Consensus-based detection schemes (with no fusion node) have also been studied in, for example, [9], [10], [11], where the nodes in the network: 1) collect measurements; and 2) subsequently run the consensus algorithm to fuse their detection rules. Running consensus distributed detection has been proposed in [12]. Running consensus differs from classical consensus detection in that it incorporates new observations at each time step $k$, in real time; thus, unlike classical consensus, no delay is introduced between collecting the observations and reaching consensus.

¹The running consensus algorithm is a type of recursive stochastic approximation algorithm, see, e.g., [1]. Reference [1] studies more general stochastic approximation type algorithms in the context of distributed estimation. We use the algorithm in the form given in [2] and will refer to it as running consensus.

We now comment on the differences between this paper and reference [12], which also studies the asymptotic optimality of distributed detection via running consensus. Reference [12] considers the Neyman-Pearson framework, while we adopt the Bayesian framework. Reference [12] assumes that, as the number of observations $k$ grows, the distribution means under the two hypotheses become closer and closer, at the rate of $1/\sqrt{k}$; consequently, as $k\to\infty$, there is an asymptotic, nonzero probability of miss and an asymptotic, nonzero probability of false alarm. In contrast, we assume that the distributions do not change with $k$ (they do not approach each other), so the Bayes probability of error decays to zero; we then examine the rate of decay of the Bayes error probability. Further, reference [12] assumes that the observations at different sensors are independent and identically distributed, with a generic distribution, while we assume Gaussian observations; however, we allow for spatial correlation among the observations, a well-suited assumption, e.g., for densely deployed wireless sensor networks (WSNs). Reference [12] studies the case where the underlying network is randomly varying; we consider a deterministically time-varying network.
Paper organization. Section II reviews large deviations results and the Chernoff lemma in hypothesis testing. Section III explains the data and network models that we assume. Section IV introduces the (asymptotically) optimal centralized detector, which assumes a fusion node, and its detection performance. Section V shows that the distributed running consensus detector asymptotically approaches in performance the optimal centralized detector. Finally, Section VI summarizes the paper.

II. Background 

In this section, we briefly review the standard large deviations analysis of binary hypothesis testing and the standard asymptotic results for binary hypothesis testing (in particular, the Chernoff lemma). We use these results throughout the paper.

A. Binary hypothesis testing problem: Log-likelihood ratio test

Consider the sequence of independent identically distributed (i.i.d.) $d$-dimensional random vectors (observations) $y(k)$, $k=1,2,\ldots$, and the binary hypothesis testing problem of deciding whether the probability measure (law) generating $y(k)$ is $\nu_0$ (under hypothesis $H_0$) or $\nu_1$ (under $H_1$). Assume that $\nu_1$ and $\nu_0$ are mutually absolutely continuous, distinguishable measures. Formally, based on the observations $y(1),\ldots,y(k)$, a decision test $T$ is a sequence of maps $T_k:\mathbb{R}^{kd}\to\{0,1\}$, $k=1,2,\ldots$, with the interpretation that $T_k(y(1),\ldots,y(k))=l$ means that $H_l$ is decided, $l=0,1$. Specifically, consider the log-likelihood ratio (LLR) test to decide between $H_0$ and $H_1$, where $T_k$ is given as follows:



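$$\mathcal{D}(k) := \frac{1}{k}\sum_{j=1}^{k}\log\frac{d\nu_1}{d\nu_0}\big(y(j)\big) = \frac{1}{k}\sum_{j=1}^{k}L(j) \tag{1}$$

$$T_k = \mathcal{I}_{\{\mathcal{D}(k)>\gamma_k\}}. \tag{2}$$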

Here $L(k):=\log\frac{d\nu_1}{d\nu_0}\big(y(k)\big)$ is the LLR (given by the Radon-Nikodym derivative of $\nu_1$ with respect to $\nu_0$, evaluated at $y(k)$), $\gamma_k$ is a chosen threshold, and $\mathcal{I}_A$ is the indicator of the event $A$. The LLR test with threshold $\gamma_k=0$, $\forall k$, is asymptotically optimal in the sense of the Bayes probability of error decay rate, as explained in subsections II-B and II-C.
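The LLR test (1)-(2) fits in a few lines of Python. This is only an illustrative sketch: the helper names `llr_test` and `gaussian_log_lr` are hypothetical, and the scalar pair $\nu_0=\mathcal{N}(\mu_0,\sigma^2)$, $\nu_1=\mathcal{N}(\mu_1,\sigma^2)$ is an assumption chosen because its per-sample LLR has a closed form.

```python
from typing import Callable, Sequence


def gaussian_log_lr(mu0: float, mu1: float, var: float) -> Callable[[float], float]:
    """Per-sample LLR log(dnu_1/dnu_0)(y) for the assumed scalar pair
    nu_0 = N(mu0, var), nu_1 = N(mu1, var)."""
    def log_lr(y: float) -> float:
        return ((y - mu0) ** 2 - (y - mu1) ** 2) / (2.0 * var)
    return log_lr


def llr_test(observations: Sequence[float],
             log_lr: Callable[[float], float],
             gamma: float = 0.0) -> int:
    """T_k of eqn (2): 1 (decide H_1) if D(k) > gamma, else 0 (decide H_0).

    D(k) is the sample average of the per-sample LLRs L(1), ..., L(k), eqn (1).
    """
    k = len(observations)
    d_k = sum(log_lr(y) for y in observations) / k
    return 1 if d_k > gamma else 0


# Usage (hypothetical numbers): nu_0 = N(-1, 1), nu_1 = N(1, 1), zero threshold.
lr = gaussian_log_lr(-1.0, 1.0, 1.0)
print(llr_test([0.5, 1.2, -0.1], lr))   # average LLR is positive, so H_1 is decided
```

Passing the per-sample LLR as a function keeps the test itself independent of the observation law, mirroring how (1)-(2) are stated for generic $\nu_0,\nu_1$.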

B. Log-likelihood ratio test: Large deviations 

This subsection studies large deviations for the LLR decision test with decision variables $\mathcal{D}(k)$ given in eqn. (1). The large deviations analysis will be useful both for estimating the exponential rate at which the Bayes probability of error decays and for showing the asymptotic optimality of the distributed running consensus detector. We first give the definition of the large deviations principle [13].

Definition 1 (Large deviations principle (LDP)) Consider a sequence of real-valued random variables $\{\Theta(k)\}_{k=1}^{\infty}=:\{\Theta(k)\}$ and denote by $\theta_k$ the probability measure of $\Theta(k)$. We say that the sequence of measures $\{\theta_k\}$ satisfies the LDP with rate function $J:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$ if the following holds:

1) For any closed, measurable set $F\subset\mathbb{R}$:
$$\limsup_{k\to\infty}\frac{1}{k}\log\theta_k(F)\leq-\inf_{t\in F}J(t).$$

2) For any open, measurable set $G\subset\mathbb{R}$:
$$\liminf_{k\to\infty}\frac{1}{k}\log\theta_k(G)\geq-\inf_{t\in G}J(t).$$

Since the $y(k)$ are i.i.d., the sequence of LLRs $\{L(k)\}$, conditioned on $H_l$, $l=0,1$, is i.i.d. Denote by $\mu_k^{(l)}$ the probability measure of $\mathcal{D}(k)$ under hypothesis $H_l$. Using Cramér's theorem ([13]), it can be shown that the sequence of measures $\{\mu_k^{(l)}\}$, $l=0,1$, satisfies the LDP with good² rate function:



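$$\Lambda^*_{(l)}(t) = \sup_{\lambda\in\mathbb{R}}\left(\lambda t-\Lambda_{(l)}(\lambda)\right), \tag{3}$$

where $\Lambda_{(l)}(\cdot)$ is the log-moment generating function of $L(k)$ under hypothesis $H_l$:

$$\Lambda_{(l)}(\lambda) = \log\mathbb{E}\left[e^{\lambda L(k)}\,\middle|\,H_l\right]. \tag{4}$$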



²A rate function is good if its sublevel sets are compact.



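That is, the rate function $\Lambda^*_{(l)}(t)$ is the Fenchel-Legendre (F-L) transform ([13]) of the log-moment generating function of $L(k)$ under $H_l$. It can be shown that $\Lambda^*_{(1)}(t)=\Lambda^*_{(0)}(t)-t$; this follows from $\Lambda_{(1)}(\lambda)=\Lambda_{(0)}(\lambda+1)$, which holds because $d\nu_1=e^{L}d\nu_0$. We summarize the LDP result in the following theorem, e.g., [13]: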

Theorem 2 The sequence of measures $\{\mu_k^{(l)}\}$ of $\mathcal{D}(k)$ under $H_l$ satisfies the LDP with good rate function given by eqn. (3).
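A quick numerical check of the identity $\Lambda^*_{(1)}(t)=\Lambda^*_{(0)}(t)-t$ is sketched below. It assumes (anticipating Section IV) a Gaussian LLR with mean $-\sigma_L^2/2$ under $H_0$ and $+\sigma_L^2/2$ under $H_1$ and variance $\sigma_L^2$, whose log-moment generating function is $\lambda m+\lambda^2\sigma_L^2/2$; the supremum in (3) is approximated by a grid search, and the value $\sigma_L^2=2$ is hypothetical.

```python
import numpy as np

LAMBDA_GRID = np.linspace(-50.0, 50.0, 200001)   # grid for the sup over lambda


def log_mgf(lam, mean, var):
    """Log-moment generating function of a N(mean, var) random variable."""
    return lam * mean + 0.5 * lam**2 * var


def legendre(t, mean, var):
    """Grid approximation of sup_lambda (lambda*t - Lambda(lambda)), eqn (3)."""
    return float(np.max(LAMBDA_GRID * t - log_mgf(LAMBDA_GRID, mean, var)))


sigma2 = 2.0                        # hypothetical sigma_L^2
m0, m1 = -sigma2 / 2, sigma2 / 2    # LLR means under H_0 and H_1
for t in (-1.0, 0.0, 0.5, 1.0):
    # Lambda*_(1)(t) versus Lambda*_(0)(t) - t
    print(t, legendre(t, m1, sigma2), legendre(t, m0, sigma2) - t)
```

At $t=0$ the same routine also returns $\Lambda^*_{(0)}(0)=\sigma_L^2/8$ for this Gaussian case.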

C. Asymptotic Bayes detection performance: Chernoff lemma

We adopt Bayes minimum probability of error detection. Denote by $P_e(k)$ the Bayes probability of error after $k$ samples are processed:

$$P_e(k) = P(H_0)\,\alpha(k)+P(H_1)\,\beta(k), \tag{5}$$

where $P(H_l)$ are the prior probabilities, $\alpha(k):=P\left(\mathcal{D}(k)>\gamma_k\,|\,H_0\right)$ and $\beta(k):=P\left(\mathcal{D}(k)\leq\gamma_k\,|\,H_1\right)$ are, respectively, the probability of false alarm and the probability of miss, and $\gamma_k$ is the test threshold.

We will be interested in the rate at which the Bayes probability of error decays to zero as the number of observations $k$ goes to infinity. Also, as auxiliary results, we will need the rates at which $\alpha(k)$ and $\beta(k)$ go to zero as $k\to\infty$. That is, we will be interested in the following quantities:

$$\lim_{k\to\infty}\frac{1}{k}\log P_e(k) \tag{6}$$
$$\lim_{k\to\infty}\frac{1}{k}\log\alpha(k) \tag{7}$$
$$\lim_{k\to\infty}\frac{1}{k}\log\beta(k). \tag{8}$$

Theorem 4 ([13]) states that, among all possible decision tests, the LLR test with zero threshold minimizes (6). This result is a corollary of Theorem 3 ([13]), which asserts that, for an LLR test with fixed threshold $\gamma_k=\gamma$, $\alpha(k)$ and $\beta(k)$ indeed decay to zero exponentially and simultaneously; Theorem 3 also expresses the exponential decay rates (7) and (8) in terms of the rate function in eqn. (3). Before stating the theorem, define $\overline{L}^{(l)}:=\mathbb{E}\left[L(k)\,|\,H_l\right]$, $l=0,1$.

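Theorem 3 The LLR test with constant threshold $\gamma_k=\gamma$, $\gamma\in\left(\overline{L}^{(0)},\overline{L}^{(1)}\right)$, satisfies:

$$\lim_{k\to\infty}\frac{1}{k}\log\alpha(k) = -\Lambda^*_{(0)}(\gamma) < 0 \tag{9}$$
$$\lim_{k\to\infty}\frac{1}{k}\log\beta(k) = \gamma-\Lambda^*_{(0)}(\gamma) < 0. \tag{10}$$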

Theorem 4 (Chernoff lemma) If $P(H_0)\in(0,1)$, then:

$$\inf_{T}\,\liminf_{k\to\infty}\frac{1}{k}\log P_e(k) = -\Lambda^*_{(0)}(0), \tag{11}$$

where the infimum over all possible tests $T$ is attained for the LLR test with $\gamma_k=0$, $\forall k$.

The quantity $\Lambda^*_{(0)}(0)=\Lambda^*_{(1)}(0)$ is called the Chernoff distance between the distributions of $y(k)$ under $H_0$ and $H_1$, or the Chernoff information [13].

Asymptotically optimal test. We introduce the following definition of an asymptotically optimal test.

Definition 5 The decision test $T$ is asymptotically optimal if it attains the infimum in eqn. (11).

We will show that, for distributed Gaussian hypothesis testing over time-varying networks, running consensus is asymptotically optimal in the sense of Definition 5.

III. Distributed detection model: Data and network models

This section describes: 1) the data model (subsection III-A), i.e., the observation model at each sensor in the network; and 2) the model of the network through which the sensors cooperate in the running consensus distributed detection algorithm (subsection III-B). The distributed detection algorithm itself is detailed in Section V.

A. Data model 

We consider Gaussian binary hypothesis testing in spatially correlated noise. The sensors operate (in terms of sensing and communication) synchronously, at discrete time steps $k$. At time $k$, sensor $i$ measures the scalar $y_i(k)$. Collect the sensor measurements in the vector $y(k)=(y_1(k),y_2(k),\ldots,y_N(k))^T$, where $N$ is the total number of sensors. Nature can be in one of two possible states: $H_1$, event occurring (e.g., target present); and $H_0$, event not occurring (e.g., target absent). We assume the following distribution model for the vector $y(k)$:

$$\text{under } H_l:\quad y(k) = m_l+\zeta(k),\quad l=0,1, \tag{12}$$

where $m_l$ is the (constant) signal under hypothesis $H_l$, and $\zeta(k)$ is zero-mean Gaussian additive noise. We assume that $\{\zeta(k)\}$ is an independent identically distributed (i.i.d.) sequence of $N\times 1$ random vectors with distribution $\zeta(k)\sim\mathcal{N}(0,S)$, where $S$ is a (positive definite) covariance matrix. Thus, with our model, the noise is temporally independent but can be spatially correlated. Spatial correlation should be taken into account due to, for example, the dense deployment of wireless sensor networks, while it is still reasonable to assume that the observations are independent over time. (Conditioned on $H_l$, the $\{y(k)\}$ are i.i.d. with distribution $\mathcal{N}(m_l,S)$.)
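To make the data model (12) concrete, the sketch below draws $y(1),\ldots,y(k)$ under $H_l$ with spatially correlated noise $\zeta(k)\sim\mathcal{N}(0,S)$, using a Cholesky factor of $S$. The function name, the exponential correlation $S_{ij}=0.5^{|i-j|}$, and the signal vector are hypothetical choices for illustration, not values from this paper.

```python
import numpy as np


def sample_observations(m_l, S, k, rng):
    """Draw y(1), ..., y(k) under H_l: y(j) = m_l + zeta(j), zeta(j) ~ N(0, S) i.i.d. (eqn (12)).

    Returns a (k, N) array whose row j-1 is y(j).
    """
    m_l = np.asarray(m_l, dtype=float)
    C = np.linalg.cholesky(S)                 # S = C C^T; requires S positive definite
    z = rng.standard_normal((k, m_l.size))    # i.i.d. N(0, I) rows: temporally independent
    return m_l + z @ C.T                      # each row has covariance C C^T = S


# Hypothetical example: N = 3 sensors with exponentially decaying spatial correlation.
rng = np.random.default_rng(0)
N = 3
S = 0.5 ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
m1 = np.array([1.0, 0.5, 0.0])
Y = sample_observations(m1, S, 200_000, rng)
print(Y.mean(axis=0))   # close to m1
print(np.cov(Y.T))      # close to S
```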



B. Network model and data mixing model 

We consider distributed detection via running consensus, where each node at time $k$: 1) measures $y_i(k)$; 2) exchanges its current decision variable (denoted by $x_i(k)$) with its neighbors; and 3) computes a weighted average of its own decision variable and its neighbors' decision variables. The network connectivity is assumed to be time varying. As with the standard consensus algorithm, the weighted averaging at each time $k$ is described by the $N\times N$ weight matrix $W(k)$. We assume that $W(k)$ is a symmetric, stochastic matrix (it has nonnegative entries and its rows sum to 1; by symmetry, its columns also sum to 1). The weight matrix $W(k)$ respects the sparsity pattern of the network, i.e., $W_{ij}(k)=0$ if the link $\{i,j\}$ is down at time $k$. We also define the undirected graph $G(k)=(V,\mathcal{E}(k))$, where $V$ is the set of nodes, with cardinality $|V|=N$, and $\mathcal{E}(k)$ is the set of undirected edges that are online at time $k$. Formally, $\mathcal{E}(k)=\{\{i,j\}:\ i<j,\ W_{ij}(k)>0\}$. Define also $J:=(1/N)\mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is the $N\times 1$ vector with unit entries. We now summarize the assumptions on the matrices $\{W(k)\}$ and the graphs $G(k)$:

Assumption 6 For the sequence of matrices $\{W(k)\}=\{W(k)\}_{k=1}^{\infty}$, we assume the following:

1) $W(k)$ is symmetric and stochastic, $\forall k$.

2) There exists a scalar $W_{\min}\in(0,1)$ such that: i) $W_{ii}(k)\geq W_{\min}$, $\forall i$, $\forall k$; and ii) $\forall k$, $W_{ij}(k)\geq W_{\min}$ if $i\neq j$ and $\{i,j\}\in\mathcal{E}(k)$.

3) There exists an integer $1\leq B<+\infty$ such that, $\forall k$, the graph $\left(V,\bigcup_{s=k}^{k+B-1}\mathcal{E}(s)\right)$ is connected.

Assumption 6-3) says that nodes should communicate sufficiently often (within finite time windows of length $B$), so that the network provides sufficiently fast information flow.
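The sketch below builds weight matrices that satisfy Assumption 6-1) and 6-2) for a given edge set and checks Assumption 6-3) on a finite sequence of graphs. The Metropolis-Hastings weight rule is our illustrative choice (the paper does not prescribe a particular rule), and all function names are hypothetical. In the example, no single graph is connected, yet the union over every window of $B=2$ consecutive graphs is.

```python
import numpy as np


def metropolis_weights(N, edges):
    """Symmetric, stochastic W with positive diagonal for an undirected edge set
    (edges given as tuples (i, j) with i < j).

    Illustrative Metropolis-Hastings rule: W_ij = 1 / (1 + max(d_i, d_j)) on edges,
    W_ii = 1 - sum_{j != i} W_ij. W_min of Assumption 6-2) can then be taken as the
    smallest positive entry over the whole sequence of such matrices.
    """
    deg = np.zeros(N, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((N, N))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[np.diag_indices(N)] = 1.0 - W.sum(axis=1)   # diagonal is zero before this line
    return W


def is_connected(N, edges):
    """Graph search connectivity test on an undirected graph with nodes 0, ..., N-1."""
    adj = {i: set() for i in range(N)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, frontier = {0}, [0]
    while frontier:
        u = frontier.pop()
        for w in adj[u] - seen:
            seen.add(w)
            frontier.append(w)
    return len(seen) == N


def union_connected(edge_seq, N, B):
    """Assumption 6-3) on a finite horizon: each window of B consecutive graphs has a connected union."""
    return all(is_connected(N, set().union(*edge_seq[k:k + B]))
               for k in range(len(edge_seq) - B + 1))


# Hypothetical example: N = 4, alternating graphs {0-1, 2-3} and {1-2}.
edge_seq = [{(0, 1), (2, 3)}, {(1, 2)}] * 5
print(is_connected(4, edge_seq[0]))      # False: a single graph is disconnected
print(union_connected(edge_seq, 4, 2))   # True: unions over windows of B = 2 are connected
```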

IV. Centralized detection: Bayes optimal test

We first consider the centralized detection scenario, as if there were a fusion node that collects and processes all sensor observations. The decision variable $\mathcal{D}(k)$ and the LLR decision test are given by eqns. (1) and (2), where now, under the data assumptions in subsection III-A:

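$$L(k) = (m_1-m_0)^T S^{-1}\left(y(k)-\frac{m_1+m_0}{2}\right). \tag{13}$$

Conditioned on either hypothesis $H_1$ or $H_0$, $L(k)\sim\mathcal{N}\left(m_L^{(l)},\sigma_L^2\right)$, where

$$m_L^{(l)} = \frac{(-1)^{l+1}}{2}(m_1-m_0)^T S^{-1}(m_1-m_0), \tag{14}$$
$$\sigma_L^2 = (m_1-m_0)^T S^{-1}(m_1-m_0). \tag{15}$$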

Define the vector $v\in\mathbb{R}^N$ as

$$v := S^{-1}(m_1-m_0). \tag{16}$$

Then, the LLR $L(k)$ can be written as follows:

$$L(k) = \sum_{i=1}^{N}v_i\left(y_i(k)-\frac{[m_1]_i+[m_0]_i}{2}\right) = \sum_{i=1}^{N}\eta_i(k), \tag{17}$$

where $[m_l]_i$ denotes the $i$-th entry of the vector $m_l$, $l=0,1$. Thus, the LLR at time $k$ is separable, i.e., the LLR is the sum of the terms $\eta_i(k)$, each of which depends affinely on the individual observation $y_i(k)$. We will exploit this fact in subsection V-A to derive the distributed running consensus detection algorithm.
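The separability in (17) can be checked numerically: the sketch below computes $L(k)$ from (13) and the local terms $\eta_i(k)$ from (16)-(17), and confirms that the terms sum to the LLR. Function names and numerical values are hypothetical; we also assume that node $i$ knows $v_i$, $[m_0]_i$, and $[m_1]_i$, computed offline from $S$, $m_0$, and $m_1$.

```python
import numpy as np


def llr_centralized(y, m0, m1, S):
    """L(k) = (m1 - m0)^T S^{-1} (y(k) - (m1 + m0)/2), eqn (13)."""
    return float((m1 - m0) @ np.linalg.solve(S, y - 0.5 * (m1 + m0)))


def local_terms(y, m0, m1, S):
    """eta_i(k) = v_i (y_i(k) - ([m1]_i + [m0]_i)/2), with v = S^{-1}(m1 - m0), eqns (16)-(17)."""
    v = np.linalg.solve(S, m1 - m0)
    return v * (y - 0.5 * (m1 + m0))


# Hypothetical example with N = 3 spatially correlated sensors.
S = np.array([[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]])
m0, m1 = np.zeros(3), np.array([1.0, 0.5, 0.0])
y = np.array([0.8, 0.1, -0.3])
print(llr_centralized(y, m0, m1, S), local_terms(y, m0, m1, S).sum())   # equal up to rounding
```

Each entry of `local_terms` uses only the local observation $y_i(k)$, which is exactly what lets the running consensus algorithm of Section V replace the fusion node.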

Applying Theorem 2 to the sequence $\{\mathcal{D}(k)\}$ (under hypothesis $H_l$, $l=0,1$), we have that the sequence of measures of $\mathcal{D}(k)$ satisfies the LDP with good rate function $I_{(l)}:\mathbb{R}\to\mathbb{R}\cup\{+\infty\}$, which, by evaluating the log-moment generating function of $L(k)$ in (13) and its F-L transform, can be shown to be:

$$I_{(l)}(t) = \frac{\left(t-m_L^{(l)}\right)^2}{2\sigma_L^2},\quad l=0,1. \tag{18}$$

We state this result as Corollary 7.



Corollary 7 The sequence $\{\mathcal{D}(k)\}$, under $H_l$, $l=0,1$, satisfies the LDP with good rate function $I_{(l)}(\cdot)$, given by eqn. (18).
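For completeness, the short derivation of (18) below uses only the Gaussian law of $L(k)$ stated after (13); evaluating it at $t=0$ gives the constant that appears in (19).

```latex
\Lambda_{(l)}(\lambda) = \log \mathbb{E}\left[e^{\lambda L(k)} \,\middle|\, H_l\right]
  = \lambda\, m_L^{(l)} + \tfrac{1}{2}\lambda^2\sigma_L^2,
\qquad
I_{(l)}(t) = \sup_{\lambda\in\mathbb{R}}\Big(\lambda\,\big(t - m_L^{(l)}\big) - \tfrac{1}{2}\lambda^2\sigma_L^2\Big)
  = \frac{\big(t - m_L^{(l)}\big)^2}{2\sigma_L^2},
\qquad
\lambda^\star = \frac{t - m_L^{(l)}}{\sigma_L^2}.
```

With $m_L^{(0)}=-\sigma_L^2/2$ from (14), this gives $I_{(0)}(0)=\sigma_L^2/8=\frac{1}{8}(m_1-m_0)^TS^{-1}(m_1-m_0)$.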

We remark that Theorem 4 also applies to the detection problem described in subsection III-A. Denote by $P_e^{\mathrm{cen}}(k)$ the Bayes probability of error of the centralized detector (defined in Section IV) after $k$ samples are processed. Due to the continuity of the rate functions in (18), it can be shown that $\liminf_{k\to\infty}\frac{1}{k}\log P_e^{\mathrm{cen}}(k)=\limsup_{k\to\infty}\frac{1}{k}\log P_e^{\mathrm{cen}}(k)=\lim_{k\to\infty}\frac{1}{k}\log P_e^{\mathrm{cen}}(k)$. Thus, in this case, Theorem 4 simplifies to the following corollary:



Corollary 8 (Chernoff lemma for the optimal centralized detector) The LLR test with $\gamma_k=0$, $\forall k$, is asymptotically optimal in the sense of Definition 5. Moreover, for the LLR test with $\gamma_k=0$, $\forall k$, we have:

$$\lim_{k\to\infty}\frac{1}{k}\log P_e^{\mathrm{cen}}(k) = -I_{(0)}(0) = -\frac{1}{8}(m_1-m_0)^T S^{-1}(m_1-m_0). \tag{19}$$
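The limit (19) can be illustrated exactly in the Gaussian model. Under $H_0$, $\mathcal{D}(k)\sim\mathcal{N}(-\sigma_L^2/2,\sigma_L^2/k)$, and under $H_1$, $\mathcal{D}(k)\sim\mathcal{N}(\sigma_L^2/2,\sigma_L^2/k)$; so with $\gamma_k=0$ both $\alpha(k)$ and $\beta(k)$ equal $Q(\sqrt{k\sigma_L^2}/2)$, where $Q$ is the standard Gaussian tail function, and hence $P_e^{\mathrm{cen}}(k)=Q(\sqrt{k\sigma_L^2}/2)$ for any priors. The sketch below, with a hypothetical $\sigma_L^2=2$, evaluates $\frac{1}{k}\log P_e^{\mathrm{cen}}(k)$ and compares it with $-\sigma_L^2/8$.

```python
import math


def centralized_error(k, sigma2):
    """Exact P_e^cen(k) of the zero-threshold LLR test in the Gaussian model:
    alpha(k) = beta(k) = Q(sqrt(k * sigma2) / 2), so P_e^cen(k) equals this for any priors."""
    x = math.sqrt(k * sigma2) / 2.0
    return 0.5 * math.erfc(x / math.sqrt(2.0))   # Q(x) = erfc(x / sqrt(2)) / 2


sigma2 = 2.0                 # hypothetical value of (m1 - m0)^T S^{-1} (m1 - m0)
chernoff = sigma2 / 8.0      # I_(0)(0) in eqn (19)
for k in (10, 100, 1000):
    print(k, math.log(centralized_error(k, sigma2)) / k, -chernoff)
```

The normalized exponent approaches $-\sigma_L^2/8$ only gradually, because $Q(\cdot)$ carries a subexponential prefactor that (19) does not capture.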


Remark. The LLR test with zero threshold is also optimal in the finite-time regime, for every $k$, in the sense that it minimizes the Bayes probability of error when the prior probabilities are $P(H_0)=P(H_1)=0.5$. When the prior probabilities are not equal, the LLR test is still optimal, but its threshold is nonzero: since $\mathcal{D}(k)$ is the sample average of the LLRs, the minimum-error threshold is $\gamma_k=\frac{1}{k}\log\frac{P(H_0)}{P(H_1)}$, which vanishes as $k\to\infty$.



V. Distributed detection algorithm 

A. Distributed detection via running consensus 

We now present a distributed detection algorithm via running consensus. With this detection algorithm, no fusion node is required, and the underlying network is generic and time varying. Running consensus was proposed in [2], and it is a stochastic approximation type algorithm (see [1]). Reference [2] studies the case when the observations of different sensors at a fixed time $k$ are i.i.d. We extend the running consensus detection algorithm to the case of spatially correlated Gaussian observations.

With the running consensus distributed detector, each node $i$ makes local decisions based on its local decision variable $x_i(k)$: if $x_i(k)>0$, then $H_1$ is accepted; if $x_i(k)\leq 0$, then $H_0$ is accepted. At each time step $k$, the local decision variable at node $i$ is improved in two ways: 1) by exchanging information with its immediate neighbors in the network; and 2) by incorporating the new local observation $y_i(k)$ into the decision process. Recall the definition of $\eta_i(k)$ in eqn. (17). Specifically, the update of the local decision variable at node $i$ is given by the following equation:

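$$x_i(k+1) = \frac{k}{k+1}\Big(W_{ii}(k)\,x_i(k)+\sum_{j\in\Omega_i(k)}W_{ij}(k)\,x_j(k)\Big)+\frac{1}{k+1}\,N\,\eta_i(k+1), \tag{20}$$
$$x_i(1) = N\,\eta_i(1),\qquad k=1,2,\ldots$$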



Here $\Omega_i(k)$ is the (time-varying) neighborhood of node $i$ at time $k$, and $W_{ij}(k)$ are the (time-varying) averaging weights, defined together with the $N\times N$ (time-varying) matrices $W(k)=[W_{ij}(k)]$ in subsection III-B. Let $x(k)=(x_1(k),x_2(k),\ldots,x_N(k))^T$ and $\eta(k)=(\eta_1(k),\ldots,\eta_N(k))^T$. The algorithm in matrix form is given by:



$$x(k+1) = \frac{k}{k+1}\,W(k)\,x(k)+\frac{1}{k+1}\,N\,\eta(k+1),\qquad x(1)=N\,\eta(1),\qquad k=1,2,\ldots \tag{21}$$
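The recursion (21) translates directly into code. The sketch below is an illustration under stated assumptions, not the authors' implementation: the local terms are passed in precomputed (row $k-1$ holds $\eta(k)$), the weight sequence is supplied as a list, and the two $4\times 4$ matrices in the example are hypothetical. Because each $W(k)$ is symmetric and stochastic, and therefore doubly stochastic, the network average $\frac{1}{N}\mathbf{1}^Tx(k)$ evolves exactly like the centralized statistic $\mathcal{D}(k)=\frac{1}{k}\sum_{j\leq k}\mathbf{1}^T\eta(j)$, which is why the innovation is scaled by $N$.

```python
import numpy as np


def running_consensus(eta, W_seq):
    """Running consensus, eqn (21):
    x(k+1) = k/(k+1) W(k) x(k) + N/(k+1) eta(k+1),  x(1) = N eta(1).

    eta: (K, N) array, row k-1 holds eta(k); W_seq[k-1] holds W(k).
    Returns a (K, N) array whose row k-1 is x(k).
    """
    K, N = eta.shape
    x = np.empty((K, N))
    x[0] = N * eta[0]
    for k in range(1, K):   # row k stores x(k+1)
        x[k] = (k / (k + 1)) * (W_seq[k - 1] @ x[k - 1]) + (N / (k + 1)) * eta[k]
    return x


# Hypothetical example: N = 4, two alternating (individually disconnected) weight matrices.
W_a = np.array([[.5, .5, 0, 0], [.5, .5, 0, 0], [0, 0, .5, .5], [0, 0, .5, .5]])
W_b = np.array([[1, 0, 0, 0], [0, .5, .5, 0], [0, .5, .5, 0], [0, 0, 0, 1.]])
rng = np.random.default_rng(1)
K, N = 50, 4
eta = rng.standard_normal((K, N))      # stand-in for eta(k); the identity below holds for any values
x = running_consensus(eta, [W_a, W_b] * K)
decisions = (x[-1] > 0).astype(int)    # local tests T_{k,i}: 1 means node i decides H_1
D = np.cumsum(eta.sum(axis=1)) / np.arange(1, K + 1)   # centralized D(k) via eqn (17)
print(np.allclose(x.mean(axis=1), D))  # True: the network average equals D(k)
```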



Recall the definition of the $N\times 1$ vector $v$ in (16). The sequence of $N\times 1$ random vectors $\{\eta(k)\}$, conditioned on $H_l$, is i.i.d. The vector $\eta(k)$ (under hypothesis $H_l$, $l=0,1$) is Gaussian with mean $m_\eta^{(l)}$ and covariance $S_\eta$:

$$m_\eta^{(l)} = \frac{(-1)^{l+1}}{2}\,\mathrm{Diag}(v)\,(m_1-m_0), \tag{22}$$
$$S_\eta = \mathrm{Diag}(v)\,S\,\mathrm{Diag}(v). \tag{23}$$

Here $\mathrm{Diag}(v)$ is a diagonal matrix with the diagonal entries equal to the entries of $v$.



B. Asymptotic optimality of the distributed detection algorithm

In this subsection, we present our main result, which states that distributed detection via running consensus asymptotically achieves the performance of the optimal centralized detector, in the sense that its Bayes error probability decays at the same exponential rate as that of the (asymptotically) optimal centralized detector.

Denote by $\chi_{i,k}^{(l)}$ the probability measure of $x_i(k)$ under hypothesis $H_l$. First, we show that the sequence of measures $\{\chi_{i,k}^{(l)}\}$, for all nodes $i$, satisfies the LDP with a good rate function; the rate function is the same for all nodes $i$, and it coincides with the rate function of the optimal centralized detector in eqn. (18).

We prove that the sequence of measures of $\{x_i(k)\}$ (under $H_l$, $l=0,1$) satisfies the LDP using the Gärtner-Ellis theorem from large deviations theory, see [13]. We now state Theorem 9.

Theorem 9 Let Assumption 6 hold. The sequence of measures $\{\chi_{i,k}^{(l)}\}$, for all nodes $i$, satisfies the large deviations principle with a good rate function. The rate function is the same as for the optimal centralized detector and is given by $I_{(l)}(\cdot)$ in eqn. (18).

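Before proving Theorem 9, define $\Phi(k,j)$, for $k>j\geq 1$, as follows:

$$\Phi(k,j) := W(k-1)\,W(k-2)\cdots W(j), \tag{24}$$

and note that the algorithm in eqn. (21) can be written as:

$$x(k) = \frac{N}{k}\sum_{j=1}^{k-1}\Phi(k,j)\,\eta(j)+\frac{N}{k}\,\eta(k),\qquad k=2,3,\ldots \tag{25}$$

Next, recall that $J=(1/N)\mathbf{1}\mathbf{1}^T$ and introduce the notation:

$$\widetilde{\Phi}(k,j) := \big(W(k-1)-J\big)\big(W(k-2)-J\big)\cdots\big(W(j)-J\big),\qquad k>j\geq 1, \tag{26}$$

and note that $\widetilde{\Phi}(k,j)=\Phi(k,j)-J$, because each $W(s)$ is symmetric and stochastic, so that $W(s)J=JW(s)=J$.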
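Both (25) and the identity $\widetilde{\Phi}(k,j)=\Phi(k,j)-J$ are easy to confirm numerically. The sketch below uses hypothetical helper names and hypothetical doubly stochastic matrices as a test case; it compares the unrolled form (25) with the recursion (21), and the product (26) with $\Phi(k,j)-J$.

```python
from functools import reduce

import numpy as np


def phi(W_seq, k, j, shift=0.0):
    """Product (W(k-1) - shift*J) ... (W(j) - shift*J), with W_seq[t-1] holding W(t).

    shift=0 gives Phi(k, j) of eqn (24); shift=1 gives Phi-tilde(k, j) of eqn (26).
    """
    N = W_seq[0].shape[0]
    J = np.full((N, N), 1.0 / N)
    return reduce(lambda A, t: A @ (W_seq[t - 1] - shift * J),
                  range(k - 1, j - 1, -1), np.eye(N))


def x_unrolled(eta, W_seq, k):
    """x(k) = (N/k) sum_{j=1}^{k-1} Phi(k, j) eta(j) + (N/k) eta(k), eqn (25); eta[j-1] holds eta(j)."""
    N = eta.shape[1]
    total = eta[k - 1].copy()
    for j in range(1, k):
        total += phi(W_seq, k, j) @ eta[j - 1]
    return (N / k) * total


W_a = np.array([[.5, .5, 0, 0], [.5, .5, 0, 0], [0, 0, .5, .5], [0, 0, .5, .5]])
W_b = np.array([[1, 0, 0, 0], [0, .5, .5, 0], [0, .5, .5, 0], [0, 0, 0, 1.]])
W_seq = [W_a, W_b] * 10
rng = np.random.default_rng(2)
K, N = 12, 4
eta = rng.standard_normal((K, N))
x = N * eta[0]                         # recursion (21), starting from x(1) = N eta(1)
for k in range(1, K):
    x = (k / (k + 1)) * (W_seq[k - 1] @ x) + (N / (k + 1)) * eta[k]
J = np.full((N, N), 1.0 / N)
print(np.allclose(x, x_unrolled(eta, W_seq, K)))                        # True
print(np.allclose(phi(W_seq, 9, 3, shift=1.0), phi(W_seq, 9, 3) - J))   # True
```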

To prove Theorem 9, we borrow the following result (Lemma 10) on the matrices $\widetilde{\Phi}(k,j)$ from reference [14] (Lemma 3.2). First, denote by $[\widetilde{\Phi}(k,j)]_{il}$ the entry in the $i$-th row and $l$-th column of the matrix $\widetilde{\Phi}(k,j)$.

Lemma 10 Let Assumption 6 hold. Then, for the matrices $\widetilde{\Phi}(k,j)$ defined by eqn. (26), there holds:

$$\max_{i,l=1,\ldots,N}\left|[\widetilde{\Phi}(k,j)]_{il}\right| \leq \theta\,\beta^{k-j}, \tag{27}$$

where $\theta=\left(1-\frac{W_{\min}}{4N^2}\right)^{-2}$ and $\beta=\left(1-\frac{W_{\min}}{4N^2}\right)^{1/B}<1$.

Lemma 10 says that, under Assumption 6, the entries of the matrix $\widetilde{\Phi}(k,j)$ decay geometrically (in $k-j$) to zero. This fact will be important in proving Theorem 9.
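The geometric decay in Lemma 10 can be observed numerically. The sketch below computes $\max_{i,l}|[\widetilde{\Phi}(k,j)]_{il}|$ for a hypothetical periodically switching pair of weight matrices whose individual graphs are disconnected but whose union over any window of $B=2$ steps is connected. It illustrates the qualitative behavior only and does not evaluate the constants $\theta$ and $\beta$ of (27).

```python
import numpy as np


def max_dev(W_seq, k, j):
    """max_{i,l} |[Phi(k, j) - J]_{il}|, with Phi(k, j) = W(k-1) ... W(j) and W_seq[t-1] = W(t)."""
    N = W_seq[0].shape[0]
    P = np.eye(N)
    for t in range(k - 1, j - 1, -1):
        P = P @ W_seq[t - 1]
    return float(np.abs(P - 1.0 / N).max())   # P - 1/N subtracts J entrywise


W_a = np.array([[.5, .5, 0, 0], [.5, .5, 0, 0], [0, 0, .5, .5], [0, 0, .5, .5]])
W_b = np.array([[1, 0, 0, 0], [0, .5, .5, 0], [0, .5, .5, 0], [0, 0, 0, 1.]])
W_seq = [W_a, W_b] * 30
for gap in (2, 10, 20, 40):
    print(gap, max_dev(W_seq, 1 + gap, 1))   # shrinks roughly geometrically in the gap k - j
```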



Proof of Theorem 9: Define, for $\mu\in\mathbb{R}$, the quantity:

$$\Lambda_k^{(l)}(\mu) := \log\mathbb{E}_l\left[\exp\left(\mu\,x_i(k)\right)\right] \tag{28}$$
$$\phantom{\Lambda_k^{(l)}(\mu)} = \log\mathbb{E}_l\left[\exp\left(\lambda^T x(k)\right)\right], \tag{29}$$

where $\lambda=\mu e_i$, $\lambda\in\mathbb{R}^N$, and $\mathbb{E}_l[a]:=\mathbb{E}[a\,|\,H_l]$, $l=0,1$. Here $e_i$ denotes the $i$-th column of the $N\times N$ identity matrix. We drop the dependence on $i$ in the definition of $\Lambda_k^{(l)}(\mu)$ for notational simplicity. Recall the expressions for $m_L^{(l)}$ and $\sigma_L^2$ in eqns. (14) and (15). We will show, for all $\mu\in\mathbb{R}$, the following equality:

$$\lim_{k\to\infty}\frac{1}{k}\Lambda_k^{(l)}(k\mu) = \frac{1}{2}\sigma_L^2\mu^2+m_L^{(l)}\mu. \tag{30}$$

Consider the function $\mu\mapsto\frac{1}{2}\sigma_L^2\mu^2+m_L^{(l)}\mu$; this function is essentially smooth, continuous, and its domain is $\mathbb{R}$; hence, by the Gärtner-Ellis theorem ([13], Theorem 2.3.6), $\{\chi_{i,k}^{(l)}\}$ (the sequence of measures of $x_i(k)$ under $H_l$) satisfies the LDP. The corresponding rate function equals the F-L transform of the function $\mu\mapsto\frac{1}{2}\sigma_L^2\mu^2+m_L^{(l)}\mu$, and it is easy to show that this F-L transform equals the rate function given by eqn. (18). Thus, proving Theorem 9 reduces to showing (30). We thus proceed with showing (30). Namely, we have:

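$$\frac{1}{k}\Lambda_k^{(l)}(k\mu) = \frac{1}{k}\log\mathbb{E}_l\left[\exp\left(N\lambda^T\sum_{j=1}^{k-1}\Phi(k,j)\,\eta(j)+N\lambda^T\eta(k)\right)\right]$$
$$= \frac{1}{k}\log\mathbb{E}_l\left[\exp\left(N\lambda^T\sum_{j=1}^{k-1}\Phi(k,j)\,\eta(j)\right)\right]+\frac{1}{k}\log\mathbb{E}_l\left[\exp\left(N\lambda^T\eta(k)\right)\right],$$

where the first equality follows from (25), and the second holds because $\eta(k)$ is independent of $\eta(j)$, $j=1,\ldots,k-1$. We are interested in computing the limit $\lim_{k\to\infty}\frac{1}{k}\Lambda_k^{(l)}(k\mu)$, for all $\mu\in\mathbb{R}$; in this respect, note that

$$\lim_{k\to\infty}\frac{1}{k}\log\mathbb{E}_l\left[\exp\left(N\lambda^T\eta(k)\right)\right] = 0,$$

for all $\lambda\in\mathbb{R}^N$, because $\eta(k)$ is a Gaussian random vector whose distribution does not depend on $k$, and hence its log-moment generating function at any point $N\lambda$ is finite and constant in $k$.

Thus, we have that $\lim_{k\to\infty}\frac{1}{k}\Lambda_k^{(l)}(k\mu)=\lim_{k\to\infty}\frac{1}{k}C_k^{(l)}(k\mu)$, where

$$C_k^{(l)}(k\mu) := \log\mathbb{E}_l\left[\exp\left(N\lambda^T\sum_{j=1}^{k-1}\Phi(k,j)\,\eta(j)\right)\right],$$

and we proceed with the computation of $C_k^{(l)}(k\mu)$. The random variables $N\lambda^T\Phi(k,j)\,\eta(j)$, $j=1,\ldots,k-1$, are independent; moreover, they are Gaussian, as linear transformations of the Gaussian vectors $\eta(j)$. Recall that $m_\eta^{(l)}$ and $S_\eta$ denote the mean and the covariance of $\eta(k)$ under hypothesis $H_l$. Using the independence of $\eta(j)$ and $\eta(s)$, $s\neq j$, and the expression for the moment generating function of $\eta(j)$, we obtain successively: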



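$$\frac{1}{k}C_k^{(l)}(k\mu) = \frac{1}{k}\log\mathbb{E}_l\left[\exp\left(N\lambda^T\sum_{j=1}^{k-1}\Phi(k,j)\,\eta(j)\right)\right]$$
$$= \frac{1}{k}\log\mathbb{E}_l\left[\prod_{j=1}^{k-1}\exp\left(N\lambda^T\Phi(k,j)\,\eta(j)\right)\right] \tag{31}$$
$$= \frac{1}{k}\log\prod_{j=1}^{k-1}\exp\left(N\lambda^T\Phi(k,j)\,m_\eta^{(l)}\right)\exp\left(\frac{1}{2}N^2\lambda^T\Phi(k,j)\,S_\eta\,\Phi(k,j)^T\lambda\right)$$
$$= \frac{1}{k}\log\prod_{j=1}^{k-1}\exp\left(N\lambda^T\left(\widetilde{\Phi}(k,j)+J\right)m_\eta^{(l)}\right)\exp\left(\frac{1}{2}N^2\lambda^T\left(J+\widetilde{\Phi}(k,j)\right)S_\eta\left(J+\widetilde{\Phi}(k,j)\right)^T\lambda\right).$$

Denote further:

$$\delta(k) := \frac{N}{k}\sum_{j=1}^{k-1}\lambda^T\widetilde{\Phi}(k,j)\,m_\eta^{(l)}+\frac{N^2}{k}\sum_{j=1}^{k-1}\lambda^T J\,S_\eta\,\widetilde{\Phi}(k,j)^T\lambda+\frac{N^2}{2k}\sum_{j=1}^{k-1}\lambda^T\widetilde{\Phi}(k,j)\,S_\eta\,\widetilde{\Phi}(k,j)^T\lambda, \tag{32}$$

$$A_{(l)}(\mu,k) := \frac{N(k-1)}{k}\lambda^T J\,m_\eta^{(l)}+\frac{N^2(k-1)}{2k}\lambda^T J\,S_\eta\,J\lambda,$$

where the dependence on $l$ is dropped in the definition of $\delta(k)$. Then, it is easy to see that $\frac{1}{k}C_k^{(l)}(k\mu)=A_{(l)}(\mu,k)+\delta(k)$; the middle sum in (32) collects the two cross terms $\lambda^T J S_\eta\widetilde{\Phi}(k,j)^T\lambda$ and $\lambda^T\widetilde{\Phi}(k,j)S_\eta J\lambda$, which are equal because $J$ and $S_\eta$ are symmetric. Also, we have:

$$\lim_{k\to\infty}A_{(l)}(\mu,k) = N\lambda^T J\,m_\eta^{(l)}+\frac{N^2}{2}\lambda^T J\,S_\eta\,J\lambda =: A_{(l)}(\mu).$$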



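Recall the expressions for $v$, $m_L^{(l)}$, $\sigma_L^2$, $m_\eta^{(l)}$, and $S_\eta$ in eqns. (16), (14), (15), (22), and (23). We proceed with the computation of $A_{(l)}(\mu)$, using $\lambda=\mu e_i$ and $e_i^TJ=\frac{1}{N}\mathbf{1}^T$:

$$\begin{aligned}
A_{(l)}(\mu) &= (-1)^{l+1}\frac{N}{2}\mu\,e_i^TJ\,\mathrm{Diag}(v)(m_1-m_0)+\frac{N^2}{2}\mu^2\,e_i^TJ\,S_\eta\,Je_i\\
&= (-1)^{l+1}\frac{1}{2}\mu\,\mathbf{1}^T\mathrm{Diag}(v)(m_1-m_0)+\frac{1}{2}\mu^2\,\mathbf{1}^T\mathrm{Diag}(v)\,S\,\mathrm{Diag}(v)\mathbf{1}\\
&= (-1)^{l+1}\frac{1}{2}\mu\,(m_1-m_0)^TS^{-1}(m_1-m_0)+\frac{1}{2}\mu^2\,(m_1-m_0)^TS^{-1}(m_1-m_0)\\
&= m_L^{(l)}\mu+\frac{1}{2}\sigma_L^2\mu^2,
\end{aligned}$$

where the third equality uses $\mathbf{1}^T\mathrm{Diag}(v)=v^T$, $v^T(m_1-m_0)=(m_1-m_0)^TS^{-1}(m_1-m_0)$, and $v^TSv=(m_1-m_0)^TS^{-1}(m_1-m_0)$.

We proceed by showing that $\delta(k)\to 0$ as $k\to\infty$, which implies the equality in eqn. (30). Define the quantities $\overline{m}^{(l)}$, $\overline{m}$, and $\overline{s}$ by:

$$\overline{m}^{(l)} := \max_{i=1,\ldots,N}\left|[m_\eta^{(l)}]_i\right| \tag{33}$$
$$\overline{m} := \max_{l=0,1}\overline{m}^{(l)} \tag{34}$$
$$\overline{s} := \max_{i,j=1,\ldots,N}\left|[S_\eta]_{ij}\right|. \tag{35}$$

Then, it can be shown that $|\delta(k)|$ is bounded as follows:

$$|\delta(k)| \leq \frac{N^2\overline{m}|\mu|+N^3\overline{s}\mu^2}{k}\sum_{j=1}^{k-1}\max_{i,l}\left|[\widetilde{\Phi}(k,j)]_{il}\right|+\frac{N^4\overline{s}\mu^2}{2k}\sum_{j=1}^{k-1}\max_{i,l}\left|[\widetilde{\Phi}(k,j)]_{il}\right|^2. \tag{36}$$

Applying Lemma 10 to (36), and using the fact that $\beta^{k-j}\leq\beta^{k-j-1}$, $k>j$, we obtain successively:

$$|\delta(k)| \leq \frac{\theta\left(N^2\overline{m}|\mu|+N^3\overline{s}\mu^2\right)}{k}\sum_{j=1}^{k-1}\beta^{k-j-1} \tag{37}$$
$$\qquad+\frac{\theta^2N^4\overline{s}\mu^2}{2k}\sum_{j=1}^{k-1}\beta^{2(k-j-1)} \tag{38}$$
$$\leq \frac{\theta\left(N^2\overline{m}|\mu|+N^3\overline{s}\mu^2\right)}{k}\,\frac{1}{1-\beta} \tag{39}$$
$$\qquad+\frac{\theta^2N^4\overline{s}\mu^2}{2k}\,\frac{1}{1-\beta^2}. \tag{40}$$

Letting $k\to+\infty$, we get that $|\delta(k)|\to 0$, and hence $\delta(k)\to 0$, which establishes eqn. (30).

We are now ready to state the main result on the asymptotic optimality of the distributed detector (in the sense of Definition 5).

Corollary 11 (Chernoff lemma for the distributed detector: Asymptotic optimality) The local decision test $T_{k,i}:=\mathcal{I}_{\{x_i(k)>0\}}$, $k=1,2,\ldots$, at each node $i$, is asymptotically optimal in the sense of Definition 5. The corresponding exponential decay rate of the Bayes probability of error at each node $i$ is given by:

$$\lim_{k\to\infty}\frac{1}{k}\log P_e^{i,\mathrm{dis}}(k) = -I_{(0)}(0) = -\frac{1}{8}(m_1-m_0)^TS^{-1}(m_1-m_0). \tag{41}$$

Proof: Denote by $\alpha_{i,\mathrm{dis}}(k)$ and $\beta_{i,\mathrm{dis}}(k)$, respectively, the probability of false alarm and the probability of miss for the distributed detector at sensor $i$, i.e.,

$$\alpha_{i,\mathrm{dis}}(k) = P\left(x_i(k)>0\,|\,H_0\right) = \chi_{i,k}^{(0)}\left((0,+\infty)\right),$$
$$\beta_{i,\mathrm{dis}}(k) = P\left(x_i(k)\leq 0\,|\,H_1\right) = \chi_{i,k}^{(1)}\left((-\infty,0]\right).$$

Consider now only $\alpha_{i,\mathrm{dis}}(k)$; the same argument applies to $\beta_{i,\mathrm{dis}}(k)$. By Theorem 9, the sequence of measures $\{\chi_{i,k}^{(0)}\}$ satisfies the LDP with good rate function $I_{(0)}(\cdot)$ given in eqn. (18). Thus, we have the following bounds:

$$\limsup_{k\to\infty}\frac{1}{k}\log\chi_{i,k}^{(0)}\left([0,+\infty)\right) \leq -\inf_{t\in[0,+\infty)}I_{(0)}(t) \tag{42}$$
$$\liminf_{k\to\infty}\frac{1}{k}\log\chi_{i,k}^{(0)}\left((0,+\infty)\right) \geq -\inf_{t\in(0,+\infty)}I_{(0)}(t). \tag{43}$$

Due to the continuity of the function $I_{(0)}(\cdot)$ (see eqn. (18)), the infima on the right-hand sides of eqns. (42) and (43) are equal; since $I_{(0)}(\cdot)$ is increasing on $[m_L^{(0)},+\infty)\supset[0,+\infty)$ (recall that $m_L^{(0)}<0$), both infima equal $I_{(0)}(0)$. Thus, we have:

$$-I_{(0)}(0) \leq \liminf_{k\to\infty}\frac{1}{k}\log\chi_{i,k}^{(0)}\left((0,+\infty)\right) \leq \limsup_{k\to\infty}\frac{1}{k}\log\chi_{i,k}^{(0)}\left((0,+\infty)\right) \leq \limsup_{k\to\infty}\frac{1}{k}\log\chi_{i,k}^{(0)}\left([0,+\infty)\right) \leq -I_{(0)}(0).$$

From the last set of inequalities we conclude that:

$$\lim_{k\to\infty}\frac{1}{k}\log\alpha_{i,\mathrm{dis}}(k) = -I_{(0)}(0). \tag{44}$$

Similarly, it can be shown that:

$$\lim_{k\to\infty}\frac{1}{k}\log\beta_{i,\mathrm{dis}}(k) = -I_{(1)}(0) \tag{45}$$
$$= -I_{(0)}(0). \tag{46}$$

Now, consider

$$P_e^{i,\mathrm{dis}}(k) = \alpha_{i,\mathrm{dis}}(k)\,P(H_0)+\beta_{i,\mathrm{dis}}(k)\,P(H_1), \tag{47}$$

for which the following inequalities hold:

$$\alpha_{i,\mathrm{dis}}(k)\,P(H_0) \leq P_e^{i,\mathrm{dis}}(k) \leq \alpha_{i,\mathrm{dis}}(k)+\beta_{i,\mathrm{dis}}(k). \tag{48}$$

By eqn. (48), we obtain:

$$\limsup_{k\to\infty}\frac{1}{k}\log P_e^{i,\mathrm{dis}}(k) \leq \max\left\{\limsup_{k\to\infty}\frac{1}{k}\log\alpha_{i,\mathrm{dis}}(k),\ \limsup_{k\to\infty}\frac{1}{k}\log\beta_{i,\mathrm{dis}}(k)\right\} = -I_{(0)}(0),$$
$$\liminf_{k\to\infty}\frac{1}{k}\log P_e^{i,\mathrm{dis}}(k) \geq \liminf_{k\to\infty}\frac{1}{k}\log\alpha_{i,\mathrm{dis}}(k) = -I_{(0)}(0),$$

where the last line uses $P(H_0)>0$, so that $\frac{1}{k}\log P(H_0)\to 0$; the claim of Corollary 11 follows. ■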
Remarks on Corollary 11. Corollary 11 says that, for large $k$ (i.e., in the asymptotic regime), the Bayes probability of error at each node $i$ behaves as $P_e^{i,\mathrm{dis}}(k)\approx e^{-kI_{(0)}(0)}$, up to factors that are subexponential in $k$. That is, for large $k$, $P_e^{i,\mathrm{dis}}(k)$ decays exponentially at the best possible rate, equal to the rate $I_{(0)}(0)$ of the (asymptotically) optimal centralized detector. This rate does not depend on the network connectivity, provided that the graph that collects all the links that are online (at least once) within a finite time window (of length $B$) is connected (see Assumption 6). Intuitively, an arbitrary time-varying network whose nodes communicate sufficiently often (within finite-length time windows) provides sufficient information flow to achieve asymptotic optimality.

We now comment on the non-asymptotic, finite-time regime. To this end, note that $P_e^{i,\mathrm{dis}}(k)$ can be expressed as $P_e^{i,\mathrm{dis}}(k)=F_i(k)\,e^{-kI_{(0)}(0)}$, where $\lim_{k\to\infty}\frac{1}{k}\log F_i(k)=0$ (thus, $F_i(k)$ does not affect the exponential decay rate as $k$ grows large). The sequence $\{F_i(k)\}$ plays a role in the finite-time regime; it depends on the network connectivity and can, in general, be different for different sensors. Due to lack of space, the (simulation-based) analysis of the finite-time regime is omitted and is pursued elsewhere. We briefly comment here that our numerical experience suggests that, in the finite-time regime, the sequence $F_i(k)$ does not have a very large effect: the best distributed sensor-detector, among all $N$ sensors, is typically close in performance to the optimal centralized detector in the finite-time regime as well.

VI. Summary 

We applied large deviations theory to analyze the performance of the running consensus distributed detection algorithm. We considered spatially correlated Gaussian noise and time-varying networks. With running consensus, the state at each node is updated at each time step by: 1) exchanging information with the immediate neighbors in the network; and 2) incorporating new local observations into the decision process. We allowed the underlying network to be time varying, provided that the graph that collects all the links that are online at least once within a finite time window is connected. We showed that, under spatially correlated Gaussian noise and the stated network connectivity assumptions, the running consensus detector asymptotically approaches the performance of the optimal centralized detector. That is, the Bayes probability of detection error at each sensor decays exponentially at the best achievable rate, the Chernoff information rate.